Using SQL primitives and parallel DB servers to speed up knowledge discovery in large relational databases

Authors

  • Alex Alves Freitas
  • Simon H. Lavington
Abstract

Efficiency is crucial in KDD (Knowledge Discovery in Databases), due to the huge amount of data stored in commercial databases. We argue that high efficiency in KDD can be achieved by combining two approaches, namely mapping KDD functionality onto standard DBMS operations and executing KDD tasks on a parallel SQL server. We propose generic KDD primitives which underlie the candidate-rule evaluation procedures of many KDD algorithms, and we evaluate the speedup achieved by a parallel SQL server when executing a decision-tree learning algorithm implemented via these primitives.
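The abstract does not spell out the primitives themselves, so the following is only a minimal sketch of how candidate-rule evaluation for a decision-tree learner might be pushed into standard SQL. It assumes that a primitive amounts to a single GROUP BY query returning class counts per attribute value. The table `weather`, its columns, and the helpers `count_by_group` and `weighted_entropy` are hypothetical names chosen for illustration, and SQLite stands in for the parallel SQL server.

```python
import math
import sqlite3


def count_by_group(conn, table, attribute, class_column, where="1=1"):
    """Return {(attribute_value, class_value): count} from one GROUP BY query.

    Identifiers are interpolated directly into the SQL text, so this sketch
    is only suitable for trusted, hard-coded table and column names.
    """
    sql = (
        f"SELECT {attribute}, {class_column}, COUNT(*) "
        f"FROM {table} WHERE {where} "
        f"GROUP BY {attribute}, {class_column}"
    )
    return {(a, c): n for a, c, n in conn.execute(sql)}


def weighted_entropy(counts):
    """Class entropy after splitting on the attribute, weighted by partition size."""
    total = sum(counts.values())
    by_value = {}
    for (value, _cls), n in counts.items():
        by_value.setdefault(value, []).append(n)
    result = 0.0
    for ns in by_value.values():
        size = sum(ns)
        h = -sum((n / size) * math.log2(n / size) for n in ns)
        result += (size / total) * h
    return result


# Hypothetical toy data; a real run would target an existing large table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, windy TEXT, play TEXT)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("sunny", "no", "no"), ("sunny", "yes", "no"),
        ("overcast", "no", "yes"), ("overcast", "yes", "yes"),
        ("rain", "no", "yes"), ("rain", "yes", "no"),
    ],
)

# Evaluate each candidate split attribute with one query; lower is better.
scores = {
    attr: weighted_entropy(count_by_group(conn, "weather", attr, "play"))
    for attr in ("outlook", "windy")
}
best = min(scores, key=scores.get)
```

In this sketch the learner only ever sees small count tables. The scan and aggregation over the full relation stay inside the database engine, which is the part a parallel server could distribute.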

Similar resources

Knowledge Discovery in Spatial Databases

Both the number and the size of spatial databases, such as geographic or medical databases, are growing rapidly because of the large amount of data obtained from satellite images, computer tomography, or other scientific equipment. Knowledge discovery in databases (KDD) is the process of discovering valid, novel and potentially useful patterns from large databases. Typical tasks for knowledge d...

Towards Large-Scale Knowledge Discovery in Databases (KDD) by Exploiting Parallelism in Generic KDD Primitives

Efficiency and scalability are crucial issues in Knowledge Discovery in Databases (KDD). Our approach to these challenging issues is to devise generic, set-based KDD primitives which are insensitive to the order in which data elements are processed. Such primitives facilitate the exploitation of parallelism. Furthermore, these primitives are generic in that they support a wide selection of rule...

EquipAsso: An Algorithm based on New Relational Algebraic Operators for Association Rules Discovery

The task of searching for interesting relationships among data has always been a research focus in data mining. The overall performance of mining association rules is determined by the discovery of the large itemsets, i.e., the itemsets whose support is above a pre-determined minimum support. The algorithms proposed for association rules show different approaches to generate all large...

Fast Join Execution Using Summary Information in Large Databases

It is well known that query execution in relational databases is not fast and that join execution is generally the most expensive operation in query execution. All three major join methods, namely nested loops, sorting, and hashing, have been perfected to an extent that any further improvement in these methods will enhance the performance of join execution only marginally. Even with the advent of...

Query Languages Supporting Descriptive Rule Mining: A Comparative Study

Recently, inductive databases (IDBs) have been proposed to tackle the problem of knowledge discovery from huge databases. With an IDB, the user/analyst performs a set of very different operations on data using a query language, powerful enough to support all the required manipulations, such as data preprocessing, pattern discovery and pattern post-processing. We provide a comparison between thr...

Publication date: 1996